401-2_Homework4

Author

Cat Dang Ton

Published

April 26, 2024

Topic description

The United States’ construction of the single-income nuclear family as the primary unit of social and material support suggests that the death of a spouse can leave widows vulnerable to financial hardship. Even as insurance and social assistance structures exist as protective measures, certain groups and certain issues can still fall outside of the contours of coverage. In addition to the possible loss of income is the cost of handling the death itself—funeral expenses, legal fees, and expenses for any illness that led to death. In this exploratory analysis, I examine the extent to which widows are unprotected from the cost of their spouse’s death by the existing social insurance/social assistance structures (social security, life insurance, etc.), and what makes a widow more likely to be unprotected.

Data and variables

I am using data from the Health and Retirement Study (HRS), a nationally representative survey of older adults in the U.S. The sample consists of respondents to the survey’s subsection on Widowhood and Divorce who experienced the death of a spouse between 2020 and 2022. This subsection asks about the financial impacts of the death, such as changes in income, social assistance, and work hours, changes in insurance coverage, and death expenses.

Dependent variable: whether or not the widow had to sell assets, withdraw money that normally would not be touched, get help from a relative, or from a church or other institution, or do anything else special to find the money to cover the deceased spouse’s death expenses. (Yes = 1, No = 0).

Independent variables:

  1. Widow’s demographic characteristics: Gender, race & ethnicity, foreign-born status, disability status
  2. Total death expenses that are NOT covered by insurance or the deceased spouse’s estate (in thousands of dollars)
    1. Where exact dollar estimates are unavailable, range estimates were recorded. I calculated the mean of range estimates. For range estimates that had been coded as “Above $10000”, I substituted them with the mean + 1 standard deviation.
  3. Widow’s employment status (currently employed, retired, homemaker)
library(pacman)
p_load(tidyverse, broom, haven, skimr, janitor, marginaleffects, lmtest, modelsummary, flextable)

# Turn off scientific notation
options(scipen = 100)
# refresh environment
rm(list = ls()) 

# set working directory to project directory
# setwd(here::here("FINAL_PROJECT_HRS_SPOUSAL_DEATH"))
# read data in relation to working directory
df <- read.csv("../Input/HRS_widows_employ_tracker_cleaned.csv")

1C.

# recode death expenses for easier interpretation
df$deathexpense_1k <- df$deathexpense_usd/1000 

# Estimate a logistic regression model
model_logit <- glm(deathexpense_special ~ foreign + female + disabled + employed + homemaker + black_nh + hisp_allraces + non_bwh + deathexpense_1k, 
  data = df, 
  family = binomial(link = "logit")) # logistic regression via the logit link
modelsummary(model_logit,
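            coef_map = coef, # label map for coefficients; assumed to be defined outside this excerpt
            gof_map = gof, # goodness-of-fit rows to display; assumed to be defined outside this excerpt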
            fmt = 4, # 4 decimal digits (to avoid rounding to 0)
            stars = T, # significance stars
            exponentiate = T,
            title = "Logistic Regression Results for Resorting to Special Means to Cover Death Expenses (odds ratios)",
            notes = c("SE in parentheses",
                        "Ref: US-born, male, non-disabled, retired, white")) 
Logistic Regression Results for Resorting to Special Means to Cover Death Expenses (odds ratios)

                           (1)
Foreign-born               1.7656**   (0.3664)
Female                     1.1033     (0.1637)
Disabled                   1.8180**   (0.3803)
Currently employed         1.4566*    (0.2430)
Homemaker                  0.9990     (0.2400)
NH Black                   1.0505     (0.2066)
Hispanic (of all races)    1.2600     (0.2675)
Other nonwhite             0.9550     (0.3259)
Death expenses ($1000s)    1.0279**   (0.0093)
Constant                   0.1874***  (0.0280)
Num.Obs.                   1302
BIC                        1476.2

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
SE in parentheses
Ref: US-born, male, non-disabled, retired, white

Interpretation of variables pre-interaction:

Holding all other variables constant,

  • The odds of resorting to special means to cover death expenses are 10.33% higher for female widows than for male widows.
  • The odds of resorting to special means to cover death expenses are 5.05% higher for non-Hispanic Black widows and 26% higher for Hispanic widows than for non-Hispanic White widows.
  • None of these three differences is statistically significant at the 0.05 level.
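
The percentages above come from subtracting 1 from each odds ratio and multiplying by 100. A minimal sketch of that arithmetic, with the odds ratios typed in by hand from the table rather than pulled from the fitted model:

```r
# Odds ratios copied by hand from the results table (not re-estimated here)
odds_ratios <- c(female = 1.1033, black_nh = 1.0505, hisp_allraces = 1.2600)

# Percent change in the odds relative to the reference group
pct_change_in_odds <- (odds_ratios - 1) * 100

print(round(pct_change_in_odds, 2))
```

An odds ratio below 1 gives a negative value under the same formula, which reads as a percent decrease in the odds.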

1D.

# Calculate the average marginal effects
ame <- avg_slopes(model_logit)

print(ame)

            Term Contrast  Estimate Std. Error        z Pr(>|z|)   S    2.5 %  97.5 %
 black_nh           1 - 0  0.008853     0.0356  0.24855  0.80371 0.3 -0.06096 0.07867
 deathexpense_1k    dY/dX  0.004902     0.0016  3.05937  0.00222 8.8  0.00176 0.00804
 disabled           1 - 0  0.117890     0.0446  2.64507  0.00817 6.9  0.03053 0.20525
 employed           1 - 0  0.070763     0.0329  2.15262  0.03135 5.0  0.00633 0.13519
 female             1 - 0  0.017349     0.0259  0.66933  0.50328 1.0 -0.03345 0.06815
 foreign            1 - 0  0.112120     0.0444  2.52691  0.01151 6.4  0.02516 0.19908
 hisp_allraces      1 - 0  0.042971     0.0411  1.04617  0.29548 1.8 -0.03753 0.12347
 homemaker          1 - 0 -0.000171     0.0428 -0.00399  0.99682 0.0 -0.08400 0.08366
 non_bwh            1 - 0 -0.008117     0.0595 -0.13640  0.89150 0.2 -0.12475 0.10851

Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high 
Type:  response 

Holding all other variables constant,

  • Female widows’ probability of resorting to special means to cover death expenses is, on average, 1.73 percentage points higher than male widows’.
  • NH Black widows’ probability of resorting to special means to cover death expenses is, on average, 0.885 percentage points higher than NH White widows’, and Hispanic widows’ probability is 4.30 percentage points higher than NH White widows’.
  • None of these differences is statistically significant at the 0.05 level (p = 0.50, 0.80, and 0.30, respectively).
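
For a 0/1 predictor, the `1 - 0` contrast can be read as an average of counterfactual differences: predict everyone’s probability with the variable set to 1, again with it set to 0, and average the gap. The sketch below does this by hand in base R. It uses simulated data, not the HRS file, and the variable names and coefficients are invented for illustration.

```r
set.seed(1)
n <- 500

# Simulated stand-in data; the names only loosely mirror the real variables
sim <- data.frame(
  female     = rbinom(n, 1, 0.6),
  expense_1k = rexp(n, rate = 1 / 5)
)
sim$y <- rbinom(n, 1, plogis(-1.5 + 0.3 * sim$female + 0.05 * sim$expense_1k))

fit <- glm(y ~ female + expense_1k, data = sim, family = binomial(link = "logit"))

# Predicted probabilities with female forced to 1, then to 0, for every row
p1 <- predict(fit, newdata = transform(sim, female = 1), type = "response")
p0 <- predict(fit, newdata = transform(sim, female = 0), type = "response")

# Average marginal effect of female, on the probability scale
ame_female <- mean(p1 - p0)
print(ame_female * 100)  # in percentage points
```

Multiplying by 100 converts the probability difference into percentage points, which is how the estimates are reported in the bullets above.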

1E.

E. Now add an interaction term between two of the variables. Is the interaction term (or terms if you have factors with multiple levels) statistically significant?

model_interact <- glm(deathexpense_special ~ foreign + female + female*black_nh + female*hisp_allraces + disabled + employed + homemaker + black_nh + hisp_allraces + non_bwh + deathexpense_1k, 
  data = df, 
  family = binomial(link = "logit")) # logistic regression via the logit link

modelsummary(model_interact,
            coef_map = coef,
            gof_map = gof,
            fmt = 4, # 4 decimal digits (to avoid rounding to 0)
            stars = T, # significance stars
            exponentiate = T,
            title = "Logistic Regression Results for Resorting to Special Means to Cover Death Expenses (odds ratios)",
            notes = c("SE in parentheses",
                        "Ref: US-born, male, non-disabled, retired, white")) 
Logistic Regression Results for Resorting to Special Means to Cover Death Expenses (odds ratios)

                           (1)
Foreign-born               1.7652**   (0.3665)
Female                     1.1147     (0.1960)
Disabled                   1.8170**   (0.3811)
Currently employed         1.4553*    (0.2429)
Homemaker                  0.9935     (0.2395)
NH Black                   1.1428     (0.4080)
Hispanic (of all races)    1.2157     (0.4532)
Other nonwhite             0.9548     (0.3259)
Death expenses ($1000s)    1.0281**   (0.0094)
Female x NH Black          0.8891     (0.3745)
Female x Hispanic          1.0475     (0.4312)
Constant                   0.1860***  (0.0304)
Num.Obs.                   1302
BIC                        1490.4

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
SE in parentheses
Ref: US-born, male, non-disabled, retired, white

Neither interaction term (Female x NH Black, Female x Hispanic) is statistically significant.
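
Because there are two interaction terms, a joint test is a useful complement to reading the individual stars. One common option is a likelihood-ratio test between the nested models. The sketch below runs one on simulated data in base R. The data, effect sizes, and object names are assumptions for illustration and do not reproduce the HRS results.

```r
set.seed(42)
n <- 1000

# Simulated race/ethnicity with mutually exclusive dummy variables
race <- sample(c("white", "black", "hisp"), n, replace = TRUE,
               prob = c(0.6, 0.2, 0.2))
sim <- data.frame(
  female        = rbinom(n, 1, 0.6),
  black_nh      = as.integer(race == "black"),
  hisp_allraces = as.integer(race == "hisp")
)
# Outcome generated with main effects only (no true interaction)
sim$y <- rbinom(n, 1, plogis(-1.5 + 0.1 * sim$female +
                               0.05 * sim$black_nh + 0.2 * sim$hisp_allraces))

fit_main <- glm(y ~ female + black_nh + hisp_allraces,
                data = sim, family = binomial(link = "logit"))
fit_int  <- glm(y ~ female * (black_nh + hisp_allraces),
                data = sim, family = binomial(link = "logit"))

# Likelihood-ratio (deviance) test of the two interaction terms jointly
lr <- anova(fit_main, fit_int, test = "Chisq")
print(lr)
```

With the fitted objects from this homework, the analogous call would be `anova(model_logit, model_interact, test = "Chisq")`.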

1F.

# Predicted probability by gender alone (from the interaction model)
plot_predictions(model_interact,
                 condition = "female") 

# Predicted probability by gender and NH Black status (Female x NH Black interaction)
plot_predictions(model_interact,
                 condition = c("female", "black_nh")) 

# Predicted probability by gender and Hispanic status (Female x Hispanic interaction)
plot_predictions(model_interact,
                 condition = c("female", "hisp_allraces")) 

These graphs show no evidence of an interaction. The standard errors are too large to distinguish the gender gap in one racial/ethnic group from the gap in another.

1G.

modelsummary(list("Model 1" = model_logit, "Model 2" = model_interact),
            coef_map = coef,
            gof_map = gof,
            fmt = 4, # 4 decimal digits (to avoid rounding to 0)
            stars = T, # significance stars
            exponentiate = T,
            title = "Logistic Regression Results for Resorting to Special Means to Cover Death Expenses (odds ratios)",
            notes = c("SE in parentheses",
                        "Ref: US-born, male, non-disabled, retired, white")) 
Logistic Regression Results for Resorting to Special Means to Cover Death Expenses (odds ratios)

                           Model 1               Model 2
Foreign-born               1.7656**   (0.3664)   1.7652**   (0.3665)
Female                     1.1033     (0.1637)   1.1147     (0.1960)
Disabled                   1.8180**   (0.3803)   1.8170**   (0.3811)
Currently employed         1.4566*    (0.2430)   1.4553*    (0.2429)
Homemaker                  0.9990     (0.2400)   0.9935     (0.2395)
NH Black                   1.0505     (0.2066)   1.1428     (0.4080)
Hispanic (of all races)    1.2600     (0.2675)   1.2157     (0.4532)
Other nonwhite             0.9550     (0.3259)   0.9548     (0.3259)
Death expenses ($1000s)    1.0279**   (0.0093)   1.0281**   (0.0094)
Female x NH Black                                0.8891     (0.3745)
Female x Hispanic                                1.0475     (0.4312)
Constant                   0.1874***  (0.0280)   0.1860***  (0.0304)
Num.Obs.                   1302                  1302
BIC                        1476.2                1490.4

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
SE in parentheses
Ref: US-born, male, non-disabled, retired, white

The model with the lowest BIC is preferred.

Model 1 has the lower BIC (1476.2 versus 1490.4), so the preferred model is the one without the interaction terms.

A BIC difference of at least 10 is conventionally treated as strong grounds for preferring one model over another. The difference here is 14.2, so Model 1 is strongly preferred over Model 2.
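
The BIC gap can be broken into a complexity penalty and a fit improvement. The sketch below uses only the numbers printed in the table. It assumes the usual definition BIC = -2 * logLik + k * log(n), with n equal to the 1302 observations and Model 2 adding exactly two parameters.

```r
# Values copied by hand from the comparison table above
bic   <- c(model_1 = 1476.2, model_2 = 1490.4)
n_obs <- 1302
extra_params <- 2  # the two interaction terms added in Model 2

# How much worse Model 2 scores on BIC
delta_bic <- unname(bic["model_2"] - bic["model_1"])

# Under BIC = -2 * logLik + k * log(n), each extra parameter costs log(n)
penalty <- extra_params * log(n_obs)

# Drop in -2 * logLik implied by the two quantities above
implied_fit_gain <- penalty - delta_bic

print(c(delta_bic = delta_bic, penalty = penalty,
        implied_fit_gain = implied_fit_gain))
```

If those assumptions hold, nearly all of the BIC gap is the penalty for the two added terms. That would be consistent with the interaction terms contributing very little to model fit.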

1H.

The interaction terms do not significantly improve the explanatory power of the model. The predicted probability plots show no clear interaction between gender and race/ethnicity. Additionally, the models’ BIC scores indicate that the model without the interaction terms is strongly preferred. Therefore, the logit model without the interaction terms provides a sufficient understanding of the data.

Question 2

2A.

What would you say are the main substantive arguments of this article?

The article unseats the theory that there is such a thing as a “ghetto subculture” that is uniform yet unregulated (permissive) and merely oppositional to middle-class mainstream culture. Looking at variation in youth beliefs about pregnancy, relationship scripts, and sexual activity within and between neighborhoods, the authors argue that, compared to their middle-class counterparts, poor urban residents live among a wider array of competing cultural scripts. Paradoxically, this regulates their behavior more: they are less likely to realize their own relationship ideals in their actual relationships with others.

2B.

  • Outcome variable: sexual activity (predicted probability)
  • Coefficients are in log odds. The evidence is on p. 357, in the second-to-last paragraph, where the authors exponentiate the coefficients and discuss them as odds ratios.
  • Standard errors are shown in parentheses.

2C.

2D.

If there were no interactions, the line representing the 4th quartile would not intersect with the lines representing the other quartiles.

2E.

Suppose this omitted variable is whether or not the respondent identifies as asexual. If asexuality is negatively correlated with sexual activity and negatively correlated with the belief that pregnancy isn’t all that bad at this stage of their lives, the coefficient for the pregnancy frame variable could be upwardly biased. The two negative relationships multiply to a positive bias, so the coefficient would be larger than it would be if the omitted variable were included in the model.

2F.

Adding this omitted variable to the model would likely reduce the magnitude of the pregnancy frame coefficient.
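
The direction of bias argued in 2E and 2F can be checked with a small simulation. In the sketch below, `z` stands in for the omitted variable and `x` for the pregnancy frame. Every number is invented for illustration and none comes from the article. Because `z` lowers both `x` and the outcome, the model that leaves `z` out should return a larger coefficient on `x` than the model that includes it.

```r
set.seed(2024)
n <- 5000

# Hypothetical omitted variable z and binary predictor x (all values invented)
z <- rbinom(n, 1, 0.3)
x <- rbinom(n, 1, ifelse(z == 1, 0.2, 0.6))        # x is less common when z = 1
y <- rbinom(n, 1, plogis(-0.5 + 0.5 * x - 2 * z))  # z lowers the outcome

fit_short <- glm(y ~ x,     family = binomial(link = "logit"))  # z omitted
fit_long  <- glm(y ~ x + z, family = binomial(link = "logit"))  # z included

print(c(short = unname(coef(fit_short)["x"]),
        long  = unname(coef(fit_long)["x"])))
```

The comparison is only approximate in a logit model, because coefficients also change scale when covariates are added, but with confounding this strong the upward bias dominates.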

2G.

Variable names are so abstracted from the substantive matters behind “neighborhood cultural heterogeneity” (which I understand to be the diversity of beliefs about the consequences of teen sex in a given neighborhood) and neighborhood disadvantage that the interaction tables and plots are unintelligible to non-expert readers. I am unsure whether “cultural heterogeneity” refers to the variety of social groups and lifestyles in a neighborhood, to the variety of opinions on pregnancy and romantic scripts, or to the distance between ideal and actual sexual activity.

Limiting gender to a control variable in the model of belief in the statement “getting someone pregnant/getting pregnant would have negative consequences for me”, and omitting gender from the romantic relationship scripts model (due to sample density issues; see the authors’ footnote 26), seem to be important limitations of the study. It would be worth interrogating whether gender differences in relationship scripts and in beliefs about pregnancy’s consequences correlate with gender differences in reporting sexual activity. Depending on gendered scripts, students could downplay their activity or define “sexual intercourse” differently, thereby affecting the dependent variable.